layer network
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > Middle East > Jordan (0.04)
Deep one-gate per layer networks with skip connections are universal classifiers
Rojas, Raul (Department of Mathematics and Statistics, University of Nevada Reno, October 2025)
This paper shows how a multilayer perceptron with two hidden layers, designed to classify two classes of data points, can easily be transformed into a deep neural network with one-gate layers and skip connections. As shown in [1], deep one-gate per layer networks can perfectly separate points belonging to two classes in an n-dimensional space. Here, I present an alternative proof that may be easier to understand. This proof shows that classical neural networks that separate two classes can be transformed into deep one-gate-per-layer networks with skip connections. A perceptron receives a vector input and divides input space into two subspaces: the positive and negative half-spaces (Figure 1a).
- North America > United States > Nevada > Washoe County > Reno (0.25)
- Asia > Singapore (0.05)
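A minimal sketch of the half-space rule described in the abstract above; the weights, bias, and function name here are illustrative, not taken from the paper:

```python
import numpy as np

def perceptron_side(w, b, x):
    """Return +1 if x lies in the positive half-space of the
    hyperplane w.x + b = 0, and -1 otherwise."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# The hyperplane x0 + x1 = 1 splits the 2-D plane into two half-spaces.
w, b = np.array([1.0, 1.0]), -1.0
print(perceptron_side(w, b, np.array([2.0, 2.0])))  # → 1 (positive side)
print(perceptron_side(w, b, np.array([0.0, 0.0])))  # → -1 (negative side)
```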
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
Wang, Kevin, Javali, Ishaan, Bortkiewicz, Michał, Trzciński, Tomasz, Eysenbach, Benjamin
Scaling up self-supervised learning has driven breakthroughs in language and vision, yet comparable progress has remained elusive in reinforcement learning (RL). In this paper, we study building blocks for self-supervised RL that unlock substantial improvements in scalability, with network depth serving as a critical factor. Whereas most RL papers in recent years have relied on shallow architectures (around 2 - 5 layers), we demonstrate that increasing the depth up to 1024 layers can significantly boost performance. Our experiments are conducted in an unsupervised goal-conditioned setting, where no demonstrations or rewards are provided, so an agent must explore (from scratch) and learn how to maximize the likelihood of reaching commanded goals. Evaluated on simulated locomotion and manipulation tasks, our approach increases performance by $2\times$ - $50\times$. Increasing the model depth not only increases success rates but also qualitatively changes the behaviors learned.
- Europe > Poland > Masovia Province > Warsaw (0.04)
- South America > Brazil > Paraná > Curitiba (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Robots (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
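The depth scaling the abstract above credits relies on residual (skip-connected) stacks. A plain residual-MLP forward pass can be sketched as follows; the depth, width, and initialization scale are illustrative placeholders, not the paper's architecture or training setup:

```python
import numpy as np

def residual_mlp(x, layers):
    """Forward pass through a stack of residual blocks:
    h <- h + W2 @ relu(W1 @ h). The skip connection keeps the
    signal (and gradients) usable even at extreme depth."""
    h = x
    for W1, W2 in layers:
        h = h + W2 @ np.maximum(W1 @ h, 0.0)
    return h

rng = np.random.default_rng(0)
dim, depth = 8, 64   # the paper scales depth to 1024; 64 keeps this demo fast
scale = 0.01         # small init so the deep stack stays numerically stable
layers = [(scale * rng.standard_normal((dim, dim)),
           scale * rng.standard_normal((dim, dim))) for _ in range(depth)]
out = residual_mlp(rng.standard_normal(dim), layers)
print(out.shape)  # → (8,)
```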
Reviews: Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
The paper investigates the problem of expressiveness in neural networks with respect to memorization capacity. The authors also show an upper bound for classification, a corollary of which is that a network with three hidden layers of sizes 2k-2k-4k can perfectly classify ImageNet. Moreover, they show that if the overall sum of hidden nodes in a ResNet is of order N/d_x, where d_x is the input dimension, then again the network can perfectly realize the data. Lastly, an analysis is given showing that batch SGD initialized close to a global minimum will come close to a point whose loss is significantly smaller than the loss at initialization (though a convergence guarantee could not be given). The paper is clear and easy to follow for the most part, and conveys a feeling that the authors did their best to make the analysis as thorough and exhaustive as possible, providing results for various settings.
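The memorization claim in the review above can be illustrated with a classical 1-D interpolation argument (not the paper's construction): one hidden ReLU unit per sample suffices to fit arbitrary labels exactly.

```python
import numpy as np

# N distinct 1-D inputs with arbitrary labels.
x = np.array([0.0, 1.0, 2.5, 4.0])
y = np.array([3.0, -1.0, 2.0, 0.5])

# One hidden ReLU unit per sample, knot placed just left of each input:
# the feature matrix is lower-triangular with a positive diagonal, hence
# invertible, so the output weights can hit every label exactly.
knots = x - 0.5
Phi = np.maximum(x[:, None] - knots[None, :], 0.0)
a = np.linalg.solve(Phi, y)

def net(t):
    return np.maximum(t[:, None] - knots[None, :], 0.0) @ a

print(np.allclose(net(x), y))  # → True: N hidden units memorize N points
```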
Reviews: Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
I thank the authors for their response. I understand that generalization is not the major contribution in this paper -- thanks for the note. I also appreciate the plot showing the numerical values of the weight norms for varying width. It is reassuring to know that these quantities do vary inversely with width for this setting. I think adding these sorts of plots to the appendix of the paper (with a bit more detailed experimentation and discussion) would be useful for the paper.
Reviews: Dendritic cortical microcircuits approximate the backpropagation algorithm
Using two compartments allows errors and activities to be represented within the same neuron. The overall procedure is similar to contrastive Hebbian learning and relies on weak top-down feedback from an initial 'self-predicting' settled state, but unlike contrastive Hebbian learning does not require separate phases. Experimental results show that the method can attain reasonable results on MNIST. Major comments: This paper presents an interesting approach to approximately implementing backpropagation that relies on a mixture of dendritic compartments and specific circuitry motifs. This is a fundamentally important topic and the results would likely be of interest to many, even if the specific hypothesis turns out to be incorrect.
Attention-based Dynamic Multilayer Graph Neural Networks for Loan Default Prediction
Zandi, Sahab, Korangi, Kamesh, Óskarsdóttir, María, Mues, Christophe, Bravo, Cristián
Whereas traditional credit scoring tends to employ only individual borrower- or loan-level predictors, it has been acknowledged for some time that connections between borrowers may result in default risk propagating over a network. In this paper, we present a model for credit risk assessment leveraging a dynamic multilayer network built from a Graph Neural Network and a Recurrent Neural Network, each layer reflecting a different source of network connection. We test our methodology in a behavioural credit scoring context using a dataset provided by U.S. mortgage financier Freddie Mac, in which different types of connections arise from the geographical location of the borrower and their choice of mortgage provider. The proposed model considers both types of connections and the evolution of these connections over time. We enhance the model by using a custom attention mechanism that weights the different time snapshots according to their importance. After testing multiple configurations, a model with GAT, LSTM, and the attention mechanism provides the best results. Empirical results demonstrate that, when it comes to predicting probability of default for the borrowers, our proposed model brings both better results and novel insights for the analysis of the importance of connections and timestamps, compared to traditional methods.
- Europe > Iceland > Capital Region > Reykjavik (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > Canada > Quebec (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Banking & Finance > Real Estate (1.00)
- Banking & Finance > Credit (1.00)
- Banking & Finance > Loans > Mortgages (0.66)
- Government > Regional Government > North America Government > United States Government (0.48)
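The custom attention over time snapshots described in the abstract above can be sketched as softmax-weighted pooling of per-snapshot embeddings; the dimensions and the single query vector are illustrative assumptions, not the paper's exact mechanism:

```python
import numpy as np

def attend_over_time(snapshots, query):
    """Score each snapshot's embedding against a query vector, softmax
    the scores into weights over time, and return the weighted sum.
    This mirrors letting the model decide which snapshots matter."""
    scores = snapshots @ query                  # (T,) one score per snapshot
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over the T snapshots
    return weights, weights @ snapshots         # (T,), (d,)

rng = np.random.default_rng(1)
T, d = 5, 4                      # e.g. 5 monthly snapshots, 4-dim embeddings
snapshots = rng.standard_normal((T, d))
query = rng.standard_normal(d)
w, pooled = attend_over_time(snapshots, query)
print(w.shape, pooled.shape)     # → (5,) (4,)
```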
Unsupervised learning of distributions on binary vectors using two layer networks
We study a particular type of Boltzmann machine with a bipartite graph structure called a harmonium. Our interest is in using such a machine to model a probability distribution on binary input vectors. We analyze the class of probability distributions that can be modeled by such machines. We then present two learning algorithms for these machines. The first learning algorithm is the standard gradient ascent heuristic for computing maximum likelihood estimates of the parameters. The second learning algorithm is a greedy method that creates the hidden units and computes their weights one at a time.
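The gradient-ascent maximum-likelihood learner the abstract mentions is commonly approximated in practice by contrastive divergence; below is a one-step sketch for a small harmonium (biases omitted, shapes illustrative), not the abstract's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cd1_step(W, v0, lr=0.1):
    """One contrastive-divergence (CD-1) update for a harmonium (RBM)
    with binary visible units v and hidden units h: positive-phase
    statistics minus one-step reconstruction statistics."""
    ph0 = sigmoid(v0 @ W)                       # P(h=1 | v0)
    h0 = (rng.random(ph0.shape) < ph0) * 1.0    # sample the hidden layer
    pv1 = sigmoid(h0 @ W.T)                     # reconstruct the visibles
    ph1 = sigmoid(pv1 @ W)
    return W + lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))

W = 0.01 * rng.standard_normal((6, 3))          # 6 visible, 3 hidden units
v = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])
W = cd1_step(W, v)
print(W.shape)  # → (6, 3)
```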